262 research outputs found

    Exploring textual data (book review)

    Book Review

    Register as a predictor of linguistic variation

    Over the last two decades, corpus analysis has been used as the basis for several important reference grammars and dictionaries of English. While these reference works have made major contributions to our understanding of English lexis and grammar, most of them share a major limitation: the failure to consider register differences. Instead, most reference works describe lexico-grammatical patterns as if they applied generally to English. The main goal of the present paper is to challenge this practice and the underlying assumption that the patterns of lexical-grammatical use in English can be described in general/global terms. Specifically, I argue that descriptions of the average patterns of use in a general corpus do not accurately describe any register. Rather, the patterns of use in speech are dramatically different from the patterns in writing (especially academic writing), and so minimally an adequate description must recognize the two major poles in this continuum (i.e., conversation versus informational written prose). The paper begins by comparing two general corpus approaches to the study of language use: variationist and text-linguistic. Although both approaches can be used to investigate the use of words, grammatical features, and registers, the two approaches differ in their bases: the first gives primacy to each linguistic token, while the second gives primacy to each text. This difference has important consequences for the overall research design, the kinds of variables that can be measured, the statistical techniques that can be applied, and the particular research questions that can be asked. As a result, the importance of register has been more apparent in text-linguistic studies than in studies of linguistic variation. The bulk of the paper, then, argues for the importance of register at all linguistic levels: lexical, grammatical, and lexico-grammatical. Analyses comparing conversation and academic writing are discussed for each level, showing how a general ‘average’ description includes some characteristics that are not applicable to one or the other register, while also omitting other important patterns of use found in particular registers.

    Corpus linguistics and the study of English grammar

    This paper describes how corpus-based analyses can be employed for the study of English grammar, with a focus on case studies taken from the Longman Grammar of Spoken and Written English (LGSWE). Two major themes are developed: 1) the kinds of unexpected findings about language use that result from corpus-based investigations, and 2) the importance of register for any descriptive account of linguistic variation. Three case studies are presented: one focusing on the use of words (i.e., the most common verbs in English); the second focusing on the use and distribution of grammatical forms (i.e., the relative frequency of simple, progressive, and perfect aspect in English); and the third describing how lexis and grammatical structure can interact in complex ways (i.e., showing how verbs with the same valency patterns can have strikingly different preferences for particular valencies). In all three cases, the paper argues for the centrality of a register perspective, showing how the patterns of use vary dramatically from one register to another. Keywords: corpus-based analyses, register, linguistic variation, valency pattern

    Noun phrase modification


    Developing Linguistic Literacy: Perspectives from Corpus Linguistics and Multi-Dimensional Analysis

    In their conceptual framework for linguistic literacy development, Ravid & Tolchinsky synthesize research studies from several perspectives. One of these is corpus-based research, which has been used for several large-scale research studies of spoken and written registers over the past 20 years. In this approach, a large, principled collection of natural texts (a 'corpus') is analysed using computational and interactive techniques, to identify the salient linguistic characteristics of each register or text variety. Three characteristics of corpus-based analysis are particularly important (see Biber, Conrad & Reppen 1998): (1) a special concern for the representativeness of the text sample being analysed, and for the generalizability of findings; (2) overt recognition of the interactions among linguistic features: the ways in which features co-occur and alternate; (3) a focus on register as the most important parameter of linguistic variation: strong patterns of use in one register often represent only weak patterns in other registers. Corpus studies have documented the linguistic differences among spoken and written registers in English and other languages. Further, by analyzing systematic corpora produced by students at different stages, these same techniques have been used to track the patterns of extended language development associated with literacy. Two major patterns emerge from studies in this research tradition: (1) adult written language is dramatically different from natural conversation; and (2) written language is by no means homogeneous: rather, there are major linguistic differences among written registers. Thus, the developmental acquisition of linguistic literacy requires control over the patterns of register variation, in addition to a mastery of the mechanics of the written mode.

    Exploring the role of lexis and grammar for the stable identification of register in an unrestricted corpus of web documents

    The Internet offers great possibilities for many scientific disciplines that utilize text data. However, the potential of online data can be limited by the lack of information on the genre or register of the documents, as register (whether a text is, e.g., a news article or a recipe) is arguably the most important predictor of linguistic variation (see Biber in Corpus Linguist Linguist Theory 8:9-37, 2012). Despite having received significant attention in recent years, the modeling of online registers has faced a number of challenges, and previous studies have presented contradictory results. In particular, these have concerned (1) the extent to which registers can be automatically identified in a large, unrestricted corpus of web documents and (2) the stability of the models, specifically the kinds of linguistic features that achieve the best performance while reflecting the registers instead of corpus idiosyncrasies. Furthermore, although the linguistic properties of registers vary importantly in a number of ways that may affect their modeling, this variation is often bypassed. In this article, we tackle these issues. We model online registers in the largest available corpus of online registers, the Corpus of Online Registers of English (CORE). Additionally, we evaluate the stability of the models towards corpus idiosyncrasies, analyze the role of different linguistic features in them, and examine how individual registers differ in these two aspects. We show that (1) competitive classification performance on a large-scale, unrestricted corpus can be achieved through a combination of lexico-grammatical features, (2) the inclusion of grammatical information improves the stability of the model, whereas many of the previously best-performing feature sets are less stable, and that (3) registers can be placed in a continuum based on the discriminative importance of lexis and grammar. These register-specific characteristics can explain the variation observed in previous studies concerning the automatic identification of online registers and the importance of different linguistic features for them. Thus, our results offer explanations for the jungle-likeness of online data and provide essential information on online registers for all studies using online data.

    Conversational Grammar- Feminine Grammar? A Sociopragmatic Corpus Study

    One area in language and gender research that has so far received only little attention is the extent to which the sexes make use of what recent corpus research has termed “conversational grammar.” The author’s initial findings have suggested that the majority of features distinctive of conversational grammar may be used predominantly by female speakers. This article reports on a study designed to test the hypothesis that conversational grammar is “feminine grammar” in the sense that women’s conversational language is more adapted to the conversational situation than men’s. Based on data from the conversational subcorpus of the British National Corpus and following the situational framework for the description of conversational features elaborated in the author’s previous research, features distinctive of conversational grammar are grouped into five functional categories and their normed frequencies compared across the sexes. The functional categories distinguish features that can be seen as adaptations to constraints set by the situational factors of (1) Shared Context, (2) Co-Construction, (3) Real-Time Processing, (4) Discourse Management, and (5) Relation Management. The study’s results, described in detail in relation to the biological category of speaker sex and cultural notions of gender, suggest that the feminine grammar hypothesis is valid.